Bohrium: Unmodified NumPy Code on CPU, GPU, and Cluster
Abstract
This paper introduces Bohrium, a runtime system that maps array operations onto a range of hardware platforms, from multi-core systems to clusters and GPU-enabled systems. The Bohrium runtime system thereby enables NumPy code to use CPUs, GPUs, and clusters. Bohrium integrates seamlessly into NumPy through implicit data parallelization of array operations, which NumPy calls Universal Functions. It requires no annotations or other code modifications beyond changing the original NumPy import statement to “import bohrium as numpy”. We evaluate the design on a setup that targets a multi-core CPU, an eight-node cluster, and a GPU, all implemented as preliminary prototypes. The evaluation includes three well-known benchmark applications, Black-Scholes, Shallow Water, and N-body, implemented in Python/NumPy.
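To make the import-swap claim concrete, here is a minimal sketch of a Black-Scholes call-price kernel written only with NumPy array operations. It is not the paper's benchmark code: the function names `cnd` and `black_scholes_call`, the sample parameters, and the polynomial approximation of the normal CDF are all illustrative assumptions. The code runs on plain NumPy as written; according to the abstract, targeting Bohrium would only mean replacing the import line with `import bohrium as numpy`, though whether every call used below is supported by a given Bohrium build is an assumption.

```python
import numpy  # per the abstract, the only change for Bohrium: import bohrium as numpy


def cnd(x):
    """Approximate the standard normal CDF element-wise.

    Uses the Abramowitz-Stegun polynomial approximation so that the whole
    computation is expressed as array operations (hypothetical helper).
    """
    a1, a2, a3, a4, a5 = (0.319381530, -0.356563782, 1.781477937,
                          -1.821255978, 1.330274429)
    ax = numpy.abs(x)
    k = 1.0 / (1.0 + 0.2316419 * ax)
    poly = a1 * k + a2 * k**2 + a3 * k**3 + a4 * k**4 + a5 * k**5
    w = 1.0 - numpy.exp(-ax * ax / 2.0) / numpy.sqrt(2.0 * numpy.pi) * poly
    # The approximation is defined for x >= 0; mirror it for negative inputs.
    return numpy.where(x < 0, 1.0 - w, w)


def black_scholes_call(s, k, t, r, sigma):
    """European call prices for an array of spot prices s (hypothetical helper)."""
    d1 = (numpy.log(s / k) + (r + 0.5 * sigma**2) * t) / (sigma * numpy.sqrt(t))
    d2 = d1 - sigma * numpy.sqrt(t)
    return s * cnd(d1) - k * numpy.exp(-r * t) * cnd(d2)


# Illustrative inputs: three spot prices, strike 100, one year to expiry,
# 5% risk-free rate, 20% volatility.
spots = numpy.array([90.0, 100.0, 110.0])
prices = black_scholes_call(spots, 100.0, 1.0, 0.05, 0.2)
print(prices)
```

Every step operates on whole arrays rather than Python loops, which is the kind of code the abstract describes as a candidate for implicit data parallelization.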
Related Resources
Theano: A CPU and GPU Math Compiler in Python
Theano is a compiler for mathematical expressions in Python that combines the convenience of NumPy’s syntax with the speed of optimized native machine language. The user composes mathematical expressions in a high-level description that mimics NumPy’s syntax and semantics, while being statically typed and functional (as opposed to imperative). These expressions allow Theano to provide symbolic ...
Doubling the Performance of Python/NumPy with less than 100 SLOC
A very simple, and outside NumPy, commonly used trick of buffer-reuse is introduced to the NumPy library to speed up the performance of scientific applications in Python/NumPy. The implementation, which we name software victim-caching, is very simple. The code itself consists of less than 100 lines of code, and took less than one day to add to NumPy, though it should be noted that the programme...
A Complete Description of the UnPython and Jit4GPU Framework
A new compilation framework enables the execution of numerical-intensive applications in an execution environment that is formed by multi-core Central Processing Units (CPUs) and Graphics Processing Units (GPUs). A critical innovation is the use of a variation of Linear Memory Access Descriptors (LMADs) to analyze loop nests and determine automatically which memory locations must be transferred...
High-performance computing tools for advancing the integrated assessment and modelling of global environmental challenges
Integrated assessment and modelling of complex social-ecological systems is required to address global environmental challenges such as climate change, food and energy security, natural resource management, and biodiversity conservation. Assessments need to capture high spatial and temporal resolution, cover large geographic extents, and quantify uncertainty. This places high computational dema...
A New Compilation Path: From Python/NumPy to OpenCL
Jit4OpenCL is a new compiler that converts scientific applications written in Python/NumPy into OpenCL code. This compiler is based on unPython, an ahead-of-time compiler from Python/NumPy to an intermediate form and OpenMP code, and on jit4GPU, a just-in-time compiler that converts that intermediate code into AMD CAL code that is specific for AMD GPUs. The targeting of OpenCL provides a new ev...
Publication date: 2013